Enriching Knowledge Sources for Natural Language Understanding

Author

  • Egoitz Laparra
Abstract

This paper presents the complete and consistent ontological annotation of the nominal part of WordNet. The annotation has been carried out using the semantic features defined in the EuroWordNet Top Concept Ontology (TCO) and has been made available to the NLP community. Until now, only an initial core set of 1,024 synsets, the so-called Base Concepts, had been ontologized in this way. The work follows a methodology based on an iterative and incremental expansion of the initial labeling through the hierarchy while setting inheritance blockage points. Since this labeling has been set on the EuroWordNet Interlingual Index (ILI), it can also be used to populate any other wordnet linked to it through a simple porting process. This feature-annotated WordNet is intended to be useful for a large number of semantic NLP tasks and for testing componential analysis in real environments for the first time. Moreover, the quantitative analysis of the work shows that more than 40% of the nominal part of WordNet is involved in structure errors or inadequacies.

The methodology followed for annotating the ILI with the TCO is based on the common assumption that hyponymy corresponds to feature set inclusion (Cruse, 2002) and on the observation that, since wordnets are taken to be crucially structured by hyponymy, it is possible to create a rich, consistent semantic lexicon by inheriting basic features through the hyponymy relations [10]. This methodology faces two main drawbacks: the hyponymy hierarchy of WordNet is not consistent (Guarino, 1998), and there may be multiple inheritance. In the EuroWordNet project, TCO features were assigned to the Base Concepts. These are general concepts that can represent the concepts below them in the hyponymy hierarchy, but they do not cover all of WordNet. Thus, first of all, we annotated the gaps of the hierarchy by assigning TCO features to the Semantic Files of
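The inheritance idea described above (features flow down hyponymy links unless an inheritance blockage point stops them) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `propagate_features`, the toy hierarchy, the feature labels, and the representation of a blockage point as a blocked (hypernym, hyponym) edge are all assumptions made for illustration.

```python
from collections import deque


def propagate_features(hyponyms, seed_features, blocked=frozenset()):
    """Push feature sets down a hyponymy graph until nothing changes.

    hyponyms      -- dict: synset -> list of its direct hyponyms
    seed_features -- dict: synset -> manually assigned feature set
    blocked       -- set of (hypernym, hyponym) edges across which
                     nothing is inherited (hypothetical encoding of
                     an inheritance blockage point)
    """
    features = {node: set(fs) for node, fs in seed_features.items()}
    queue = deque(seed_features)
    while queue:
        parent = queue.popleft()
        for child in hyponyms.get(parent, ()):
            if (parent, child) in blocked:
                continue  # blockage point: do not inherit over this link
            current = features.setdefault(child, set())
            missing = features[parent] - current
            if missing:
                # Hyponymy as feature set inclusion: the hyponym ends up
                # with at least the features of its hypernym.
                current |= missing
                queue.append(child)  # revisit so its hyponyms update too
    return features


# Toy hierarchy with multiple inheritance: "toy_dog" hangs under both
# "dog" and "toy". All names and labels here are illustrative only.
hyponyms = {
    "entity": ["animal", "artifact"],
    "animal": ["dog"],
    "artifact": ["toy"],
    "dog": ["toy_dog"],
    "toy": ["toy_dog"],
}
seeds = {"animal": {"Living", "Animal"}, "artifact": {"Artifact"}}

unblocked = propagate_features(hyponyms, seeds)
result = propagate_features(hyponyms, seeds, blocked={("dog", "toy_dog")})
print(sorted(unblocked["toy_dog"]))
print(sorted(result["toy_dog"]))
```

Without the blocked edge, the toy dog inherits the animal features through the questionable link; marking that link as a blockage point keeps only the features coming from the artifact branch. Under these assumptions, this is the kind of inconsistency that blockage points are meant to contain.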

Related resources

Incorporating Knowledge in Natural Language Learning: A Case Study (to appear, COLING-ACL '98 Workshop on Usage of WordNet in Natural Language Processing Systems)

Incorporating external information during a learning process is expected to improve its efficiency. We study a method for incorporating noun-class information, in the context of learning to resolve Prepositional Phrase Attachment (PPA) disambiguation. This is done within a recently introduced architecture, SNOW, a sparse network of threshold gates utilizing the Winnow learning algorithm. That ...

The Links Have It: Infobox Generation by Summarization over Linked Entities

Online encyclopedias such as Wikipedia have become one of the best sources of knowledge. Much effort has been devoted to expanding and enriching the structured data by automatic information extraction from unstructured text in Wikipedia. Although remarkable progress has been made, their effectiveness and efficiency are still limited as they try to tackle an extremely difficult natural language ...

Studying the History of Pre-Modern Zoology with Linked Data and Vocabularies

In this paper we first present the international multidisciplinary research network Zoomathia, which aims to study the transmission of zoological knowledge from Antiquity to the Middle Ages through varied resources, and considers especially textual information, including compilation literature such as encyclopaedias. We then present a preliminary work in the context of Zoomathia consisting in (...

Natural Language Processing for Cultural Heritage Domains

Museums, archives, libraries and other cultural heritage institutes maintain large collections of artefacts which are valuable knowledge sources for both experts and interested lay persons. Recently, more and more cultural heritage institutes have started to digitise their collections, for instance to make them accessible via web portals. However, while digitisation is a necessary first step to...

Extracting Lexical-Semantic Knowledge from the Portuguese Wiktionary

Public domain collaborative resources like Wiktionary and Wikipedia have recently become attractive sources for information extraction. To use these resources in natural language processing (NLP) tasks, efficient programmatic access to their contents is required. In this work, we have extracted semantic relations automatically from the Portuguese Wiktionary and compared our results with the re...

Utilizing, Creating and Publishing Linked Open Data with the Thesaurus Management Tool PoolParty

We introduce the Thesaurus Management Tool (TMT) PoolParty based on Semantic Web standards that reduces the effort to create and maintain thesauri by utilizing Linked Open Data (LOD), text-analysis and easy-to-use GUIs. PoolParty’s aim is to lower the access barriers to managing thesauri, so domain experts can contribute to thesaurus creation without needing knowledge about the Semantic Web. A ...

Publication date: 2009